stable matching
Match Made with Matrix Completion: Efficient Learning under Matching Interference
Tang, Zhiyuan, Chen, Wanning, Xu, Kan
Matching markets face increasing needs to learn the matching qualities between demand and supply for effective design of matching policies. In practice, the matching rewards are high-dimensional due to the growing diversity of participants. We leverage a natural low-rank matrix structure of the matching rewards in these two-sided markets, and propose to utilize matrix completion to accelerate reward learning with limited offline data. A unique property for matrix completion in this setting is that the entries of the reward matrix are observed with matching interference -- i.e., the entries are not observed independently but dependently due to matching or budget constraints. Such matching dependence renders unique technical challenges, such as sub-optimality or inapplicability of the existing analytical tools in the matrix completion literature, since they typically rely on sample independence. In this paper, we first show that standard nuclear norm regularization remains theoretically effective under matching interference. We provide a near-optimal Frobenius norm guarantee in this setting, coupled with a new analytical technique. Next, to guide certain matching decisions, we develop a novel ``double-enhanced'' estimator, based off the nuclear norm estimator, with a near-optimal entry-wise guarantee. Our double-enhancement procedure can apply to broader sampling schemes even with dependence, which may be of independent interest. Additionally, we extend our approach to online learning settings with matching constraints such as optimal matching and stable matching, and present improved regret bounds in matrix dimensions. Finally, we demonstrate the practical value of our methods using both synthetic data and real data of labor markets.
- North America > United States > Texas (0.04)
- North America > United States > Arizona (0.04)
- Asia > Middle East > Jordan (0.04)
Equitable Stable Matchings in Quadratic Time
Can a stable matching that achieves high equity among the two sides of a market be reached in quadratic time? The Deferred Acceptance (DA) algorithm finds a stable matching that is biased in favor of one side; optimizing apt equity measures is strongly NP-hard. A proposed approximation algorithm offers a guarantee only with respect to the DA solutions. Recent work introduced Deferred Acceptance with Compensation Chains (DACC), a class of algorithms that can reach any stable matching in O(n^4) time, but did not propose a way to achieve good equity. In this paper, we propose an alternative that is computationally simpler and achieves high equity too. We introduce Monotonic Deferred Acceptance (MDA), a class of algorithms that progresses monotonically towards a stable matching; we couple MDA with a mechanism we call Strongly Deferred Acceptance (SDA), to build an algorithm that reaches an equitable stable matching in quadratic time; we amend this algorithm with a few low-cost local search steps to what we call Deferred Local Search (DLS), and demonstrate experimentally that it outperforms previous solutions in terms of equity measures and matches the most efficient ones in runtime.
Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship
The problem of two-sided matching markets is well-studied in computer science and economics, owing to its diverse applications across numerous domains. Since market participants are usually uncertain about their preferences in various online matching platforms, an emerging line of research is dedicated to the online setting where one-side participants (players) learn their unknown preferences through multiple rounds of interactions with the other side (arms). Sankararaman et al. provide an $Ω\left( \frac{N\log(T)}{Δ^2} + \frac{K\log(T)}Δ \right)$ regret lower bound for this problem under serial dictatorship assumption, where $N$ is the number of players, $K (\geq N)$ is the number of arms, $Δ$ is the minimum reward gap across players and arms, and $T$ is the time horizon. Serial dictatorship assumes arms have the same preferences, which is common in reality when one side participants have a unified evaluation standard. Recently, the work of Kong and Li proposes the ET-GS algorithm and achieves an $O\left( \frac{K\log(T)}{Δ^2} \right)$ regret upper bound, which is the best upper bound attained so far. Nonetheless, a gap between the lower and upper bounds, ranging from $N$ to $K$, persists. It remains unclear whether the lower bound or the upper bound needs to be improved. In this paper, we propose a multi-level successive selection algorithm that obtains an $O\left( \frac{N\log(T)}{Δ^2} + \frac{K\log(T)}Δ \right)$ regret bound when the market satisfies serial dictatorship. To the best of our knowledge, we are the first to propose an algorithm that matches the lower bound in the problem of matching markets with bandits.
- Asia > Middle East > Jordan (0.04)
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Hosseini, Hadi, Khanna, Samarth, Singh, Ronak
The rise of Large Language Models (LLMs) has driven progress in reasoning tasks -- from program synthesis to scientific hypothesis generation -- yet their ability to handle ranked preferences and structured algorithms in combinatorial domains remains underexplored. We study matching markets, a core framework behind applications like resource allocation and ride-sharing, which require reconciling individual ranked preferences to ensure stable outcomes. We evaluate several state-of-the-art models on a hierarchy of preference-based reasoning tasks -- ranging from stable-matching generation to instability detection, instability resolution, and fine-grained preference queries -- to systematically expose their logical and algorithmic limitations in handling ranked inputs. Surprisingly, even top-performing models with advanced reasoning struggle to resolve instability in large markets, often failing to identify blocking pairs or execute algorithms iteratively. We further show that parameter-efficient fine-tuning (LoRA) significantly improves performance in small markets, but fails to bring about a similar improvement on large instances, suggesting the need for more sophisticated strategies to improve LLMs' reasoning with larger-context inputs.
- Europe > Austria > Vienna (0.14)
- North America > Canada > British Columbia > Vancouver (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (5 more...)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
- Transportation > Passenger (0.48)
- Transportation > Ground > Road (0.34)
- North America > United States > California (0.04)
- North America > United States > Texas (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Asia > Middle East > Jordan (0.04)
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (2 more...)
- Asia > Middle East > Jordan (0.04)
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Education (1.00)
- Banking & Finance (0.88)
- North America > United States (0.14)
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
- (2 more...)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Hungary (0.04)
- Europe > Germany (0.04)
- (9 more...)
cb70ab375662576bd1ac5aaf16b3fca4-AuthorFeedback.pdf
We thank all reviewers for the time they invested to review this paper and share their insights. We have conducted experiments on real-world data, yet could not include them within page limits. Publication of the algorithm in an implemented code (e.g. Java as stated in Line 304). The pseudocodes are given below.